GECCO 2004 CD-ROM (LNCS 3102)

Identification of Informative Genes for Molecular Classification Using Probabilistic Model Building Genetic Algorithm

Topon Kumar Paul and Hitoshi Iba

Graduate School of Frontier Sciences, The University of Tokyo, Kashiwanoha 5-1-5, Kashiwa-shi, Chiba 277-8561, Japan
topon@iba.k.u-tokyo.ac.jp
iba@iba.k.u-tokyo.ac.jp

Abstract. DNA microarrays allow the expression levels of thousands of genes in an organism to be monitored and measured simultaneously. Systematic computational analysis of this vast amount of data provides insight into many aspects of biological processes. Recently, there has been growing interest in classifying patient samples based on these gene expression levels. The main challenge is that the number of genes is overwhelming relative to the number of available training samples in a data set; moreover, many of these genes are irrelevant for classification and have a negative effect on the accuracy of the classifier. The choice of genes affects several aspects of classification: accuracy, required learning time, cost, and the number of training samples needed. In this paper, we propose a new Probabilistic Model Building Genetic Algorithm (PMBGA) for the identification of informative genes for molecular classification and present our unbiased experimental results on three benchmark data sets.

LNCS 3102, p. 414 ff.
